Abstract. We consider the problem of computing the global infimum of a
real polynomial $f$ on $\mathbb{R}^n$. Every global minimizer of $f$ lies on its gradient
variety, i.e., the algebraic subset of $\mathbb{R}^n$ where the gradient of $f$ vanishes. If $f$
attains a minimum on $\mathbb{R}^n$, it is therefore equivalent to look for the greatest
lower bound of $f$ on its gradient variety. Nie, Demmel and Sturmfels recently
proved a theorem about the existence of sums of squares certificates for such
lower bounds. Based on these certificates, they find arbitrarily tight relaxations
of the original problem that can be formulated as semidefinite programs and
thus be solved efficiently.

We deal here with the more general case when $f$ is bounded from below
but does not necessarily attain a minimum. In this case, the method of Nie,
Demmel and Sturmfels might yield completely wrong results. In order to
overcome this problem, we replace the gradient variety by larger semialgebraic
subsets of $\mathbb{R}^n$ which we call gradient tentacles. It now gets substantially harder
to prove the existence of the necessary sums of squares certificates.



1. Introduction 

Throughout this article, $\mathbb{N} := \{1, 2, \dots\}$, $\mathbb{R}$ and $\mathbb{C}$ denote the sets of natural, real
and complex numbers, respectively. We fix $n \in \mathbb{N}$, and consider real polynomials
in $n$ variables $X := (X_1, \dots, X_n)$. These polynomials form a commutative ring

$\mathbb{R}[X] := \mathbb{R}[X_1, \dots, X_n]$.

1.1. The problem. We consider the problem of computing good approximations
for the global infimum

$f^* := \inf\{f(x) \mid x \in \mathbb{R}^n\} \in \mathbb{R} \cup \{-\infty\}$

of a polynomial $f \in \mathbb{R}[X]$. Since $f^*$ is the greatest lower bound of $f$, it is equivalent
to compute

(1) $f^* = \sup\{a \in \mathbb{R} \mid f - a \ge 0 \text{ on } \mathbb{R}^n\} \in \mathbb{R} \cup \{-\infty\}$.

To solve this hard problem, it has become a standard approach to approximate $f^*$
by exchanging in (1) the nonnegativity constraint

(2) $f - a \ge 0$ on $\mathbb{R}^n$

by a computationally more feasible condition and analyze the error caused by this
substitution. Typically, the choice of this replacement is related to the interplay 
between (globally) nonnegative polynomials, sums of squares of polynomials and 
semidefinite optimization (also called semidefinite programming): 

1.2. Method based on the fact that every sum of squares of polynomials 
is nonnegative (Shor [Sho], Stetsyuk [SS], Parrilo and Sturmfels [PS] et 
al.) We start with the most basic ideas concerning these connections which can be 
found in greater detail in the just cited references. A first try is to replace condition
(2) by the constraint

(3) $f - a$ is a sum of squares in the polynomial ring $\mathbb{R}[X]$

since every sum of squares in $\mathbb{R}[X]$ is obviously nonnegative on $\mathbb{R}^n$.

The advantage of (3) over (2) is that sums of squares of polynomials can be nicely
parametrized. Fix a column vector $v$ whose entries are a basis of the vector space
$\mathbb{R}[X]_d$ of all real polynomials of degree $\le d$ in $n$ variables ($d \in \mathbb{N}_0 := \{0\} \cup \mathbb{N}$).
This vector has a certain length $k = \dim \mathbb{R}[X]_d$. It is easy to see that the map
from the vector space $S\mathbb{R}^{k \times k}$ of symmetric $k \times k$ matrices to $\mathbb{R}[X]_{2d}$ defined by
$M \mapsto v^T M v$ is surjective. Using the spectral theorem for symmetric matrices, it
is not hard to prove that a polynomial $f \in \mathbb{R}[X]_{2d}$ is a sum of squares in $\mathbb{R}[X]$ if
and only if $f = v^T M v$ for some positive semidefinite matrix $M \in S\mathbb{R}^{k \times k}$. Use the
following remark which is an easy exercise (write the polynomials as sums of their
homogeneous parts).

Remark 1. In any representation $f = \sum_i g_i^2$ of a polynomial $f \in \mathbb{R}[X]_{2d}$ as a sum
of squares of polynomials $g_i \in \mathbb{R}[X]$, we have necessarily $\deg g_i \le d$.
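
The Gram matrix parametrization can be tried out by hand on a small case. The following Python sketch is only an illustration under assumptions that are not taken from the article: it uses the univariate polynomial $X^4 + 1$, the monomial vector $v = (1, X, X^2)$, and a hypothetical one-parameter family `gram_matrix(lam)` of symmetric matrices with $v^T M v = X^4 + 1$. Instead of the spectral theorem it uses a hand-written Cholesky factorization $M = LL^T$, which also turns a positive definite $M$ into an explicit sum of squares.

```python
import math

def gram_matrix(lam):
    """Symmetric M(lam) with v^T M v = X^4 + 1 for v = (1, X, X^2).

    The X^2 coefficient is 2*lam (middle entry) - 2*lam (corner entries) = 0.
    """
    return [[1.0, 0.0, -lam],
            [0.0, 2.0 * lam, 0.0],
            [-lam, 0.0, 1.0]]

def cholesky(m):
    """Return lower triangular L with L L^T = m.

    Raises ValueError if m is not positive definite.
    """
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return low

def sos_value(low, x):
    """Evaluate sum_j (sum_i L[i][j] * v_i(x))^2, which equals v^T L L^T v."""
    v = [1.0, x, x * x]
    n = len(v)
    return sum(sum(low[i][j] * v[i] for i in range(n)) ** 2 for j in range(n))

factor = cholesky(gram_matrix(0.5))
for x in (-2.0, 0.3, 1.7):
    # Both numbers on each line should agree up to rounding.
    print(x, sos_value(factor, x), x ** 4 + 1)
```

Each column of the factor gives the coefficients of one polynomial $g_j$ in the basis $v$, so the printed values compare $\sum_j g_j(x)^2$ with $x^4 + 1$. Searching over all admissible Gram matrices at once is what the semidefinite program does; the sketch only checks one member of the family.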

The described parametrization shows that the modified problem (where we
exchange (2) by (3)), i.e., the problem to compute

(4) $f^{\mathrm{sos}} := \sup\{a \in \mathbb{R} \mid f - a \text{ is a sum of squares in } \mathbb{R}[X]\} \in \mathbb{R} \cup \{-\infty\}$

can be written as a semidefinite optimization problem (also called semidefinite
program or SDP for short), i.e., as the problem of minimizing (or maximizing) an
affine linear function on the intersection of the cone of positive semidefinite
matrices with an affine subspace in $S\mathbb{R}^{k \times k}$. For solving SDPs, there exist very good
numerical algorithms, perhaps almost as good as for linear optimization problems.
Linear optimization can be seen as the restriction of semidefinite optimization to
diagonal matrices, i.e., a method to minimize an affine linear function on the
intersection of the cone $\mathbb{R}^k_{\ge 0}$ with an affine subspace of $\mathbb{R}^k$. Speaking very vaguely, most
concepts from linear optimization carry over to semidefinite optimization because
every symmetric matrix can be diagonalized. We refer for example to [Tod] for an
introduction to semidefinite programming.

Whereas computing $f^*$ as defined in (1) is a very hard problem, it is relatively
easy to compute (numerically to a given precision) $f^{\mathrm{sos}}$ defined in (4). Of course,
the question arises how $f^*$ and $f^{\mathrm{sos}}$ are related. Since (3) implies (2), it is clear
that $f^{\mathrm{sos}} \le f^*$. The converse implication (and thus $f^{\mathrm{sos}} = f^*$) holds in some cases:
A globally nonnegative polynomial

• in one variable or

• of degree at most two or

• in two variables of degree at most four

is a sum of squares of polynomials. We refer to [Rez] for an overview of these and
related old facts. However, recently Blekherman has shown in [Ble] that for fixed
degree $d \ge 4$ and high number of variables $n$ only a very small portion (in some
reasonable sense) of the globally nonnegative polynomials of degree at most $d$ in
$n$ variables are sums of squares. In particular, $f^{\mathrm{sos}}$ will often differ from $f^*$. For
example, the Motzkin polynomial

(5) $M := X^2 Y^2 (X^2 + Y^2 - 3Z^2) + Z^6 \in \mathbb{R}[X, Y, Z]$

is nonnegative but not a sum of squares (see [Rez, PS]). We have $M^* = 0$ but
$M^{\mathrm{sos}} = -\infty$. The latter follows from the fact that $M$ is homogeneous and not a
sum of squares by the following remark applied to $f := M - a$ for $a \in \mathbb{R}$ (which
can again be proved easily by considering homogeneous parts).

Remark 2. If $f$ is a sum of squares in $\mathbb{R}[X]$, then so is the highest homogeneous
part (the leading form) of $f$.
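
The claim $M^* = 0$ for the Motzkin polynomial can be sanity-checked numerically. The sketch below is an illustration only and proves nothing: it evaluates $M$ on a small integer grid (the grid range is an arbitrary choice made here) and at the point $(1, 1, 1)$. Nonnegativity itself follows from the inequality between the arithmetic and geometric means applied to $X^4Y^2$, $X^2Y^4$ and $Z^6$.

```python
import itertools

def motzkin(x, y, z):
    """Motzkin polynomial M = X^2 Y^2 (X^2 + Y^2 - 3 Z^2) + Z^6 from (5)."""
    return x * x * y * y * (x * x + y * y - 3 * z * z) + z ** 6

# Integer grid, so every evaluation is exact integer arithmetic.
grid = range(-4, 5)
values = [motzkin(x, y, z) for x, y, z in itertools.product(grid, repeat=3)]

print(min(values))       # smallest value found on the grid
print(motzkin(1, 1, 1))  # value of M at a point where it vanishes
```

That no sum of squares certificate exists for any $M - a$ cannot be seen from such evaluations; it needs the algebraic argument of Remark 2.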

We see that the basic problem with this method (computing $f^{\mathrm{sos}}$ by solving
an SDP and hoping that $f^{\mathrm{sos}}$ is close to $f^*$) is that polynomials positive on $\mathbb{R}^n$
in general do not have a representation as a sum of squares, a fact that Hilbert
already knew.

1.3. The Positivstellensatz. In the 17th of his famous list of 23 problems, Hilbert
asked whether every (globally) nonnegative (real) polynomial (in several variables)
was a sum of squares of rational functions. Artin answered this question
affirmatively in 1926 and today there exist numerous refinements of his solution. One of
them is the Positivstellensatz (in analogy to Hilbert's Nullstellensatz). It is often
attributed to Stengle [Ste] who clearly deserves credit for finding it independently
and making it widely known. However, Prestel [PD, Section 4.7] recently discovered
that Krivine [Kri] knew the result about ten years earlier in 1964. Here we state
only the following special case of the Positivstellensatz.

Theorem 3 (Krivine). For every $f \in \mathbb{R}[X]$, the following are equivalent.

(i) $f > 0$ on $\mathbb{R}^n$

(ii) There are sums of squares $s$ and $t$ in $\mathbb{R}[X]$ such that $sf = 1 + t$.

By this theorem, we have of course that $f^*$ is the supremum over all $a \in \mathbb{R}$ such
that there are sums of squares $s, t \in \mathbb{R}[X]$ with $s(f - a) = 1 + t$. When one tries to
write this as an SDP there are two obstacles.

First, each SDP involves matrices of a fixed (finite) size. But with matrices of
a fixed size, we can only parametrize sums of squares up to a certain degree. We
therefore need to impose a degree restriction on $s$ and $t$. There are no (at least
up to now) practically relevant degree bounds that could guarantee that such a
restriction would not affect the result. We refer to the tremendous work [Scd] of
Schmid on degree bounds. This first obstacle, namely the question of degrees of
the sums of squares, will accompany us throughout the article. The answer will
always be to model the problem not as a single SDP but as a whole sequence of
SDPs, each SDP corresponding to a certain degree restriction. As you solve one
SDP after the other, the degree restriction gets less restrictive and you hope for fast
convergence of the optimal values of the SDPs to $f^*$. For newcomers in the field, it
seems at first glance unsatisfactory having to deal with a whole sequence of SDPs
rather than a single SDP. But after all, it is only natural that a very hard problem
cannot be modeled by an SDP of a reasonable size so that you have to look for
good relaxations of the problem which can be dealt with more easily and to which the
techniques of mathematical optimization can be applied.

The second obstacle is much more severe. It is the fact that the unknown
polynomial $s \in \mathbb{R}[X]$ is multiplied with the unknown $a \in \mathbb{R}$ on the left hand side of
the constraint $s(f - a) = 1 + t$. This makes the formulation as an SDP (even after
having imposed a restriction on the degree of $s$ and $t$) impossible (or at least highly
non-obvious). Of course, if you fix $a \in \mathbb{R}$ and a degree bound $2d$ for $s$ and $t$, then
the question whether there exist sums of squares $s$ and $t$ of degree at most $2d$ such
that $s(f - a) = 1 + t$ is equivalent to the feasibility of an SDP. But this plays
(at least currently) only a role as a criterion that might help to decide whether a
certain fixed (or guessed) $a \in \mathbb{R}$ is a strict lower bound of $f$. We refer to [PS] for
more details. What one needs are representation theorems for positive polynomials
that are better suited for optimization than the Positivstellensatz (even if they are
sometimes less aesthetic).

1.4. "Big ball" method proposed by Lasserre [L1]. In the last 15 years, a lot
of progress has been made in proving existence of sums of squares certificates which
can be exploited for optimization (although most of the new results were obtained
without having in mind the application in optimization which has been established
more recently). The first breakthrough was perhaps Schmüdgen's theorem [Sch,
Corollary 3] all of whose proofs use the Positivstellensatz. In this article, we will
prove a generalization of Schmüdgen's theorem, namely Theorem 9 below. In [L1],
Lasserre uses the following special case of Schmüdgen's theorem which has already
been proved by Cassier [Cas, Théorème 4] and which can even be derived easily
from [Kri, Théorème 12].

Theorem 4 (Cassier). For $f \in \mathbb{R}[X]$ and $R > 0$, the following are equivalent.

(i) $f \ge 0$ on the closed ball centered at the origin of radius $R$

(ii) For all $\varepsilon > 0$, there are sums of squares $s$ and $t$ in $\mathbb{R}[X]$ such that

$f + \varepsilon = s + t(R^2 - \|X\|^2)$.

Here and in the following, we use the notation

$\|X\|^2 := X_1^2 + \dots + X_n^2 \in \mathbb{R}[X]$.

Similar to Subsection 1.2, it can be seen that for any fixed $d \in \mathbb{N}_0$, computing the
supremum over all $a \in \mathbb{R}$ such that $f - a = s + t(R^2 - \|X\|^2)$ for some sums of squares
$s, t \in \mathbb{R}[X]$ of degree at most $2d$ amounts to solving an SDP. Therefore you get a
sequence of SDPs parametrized by $d \in \mathbb{N}_0$. Theorem 4 can now be interpreted as a
convergence result, namely the sequence of optimal values of these SDPs converges
to the minimum of $f$ on the closed ball around the origin with radius $R$. If one
has a polynomial $f \in \mathbb{R}[X]$ attaining a minimum on $\mathbb{R}^n$ and for which one knows
moreover a big ball on which this minimum is attained, this method is good for
computing $f^*$. Of course, if you do not know such a big ball in advance you might
choose larger and larger $R$. But at the same time you might have to choose a bigger
and bigger degree restriction $d \in \mathbb{N}_0$ and it is not really clear how to get a sequence
of SDPs that converges to $f^*$.

1.5. Lasserre's high order perturbation method [L2]. Recently, Lasserre used
in [L2] a theorem of Nussbaum from operator theory to prove the following result
that can be exploited in a similar way for global optimization of polynomials.

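Theorem 5 (Lasserre). For every $f \in \mathbb{R}[X]$, the following are equivalent:

(i) $f \ge 0$ on $\mathbb{R}^n$

(ii) For all $\varepsilon > 0$, there is $r \in \mathbb{N}_0$ such that

$f + \varepsilon \sum_{i=1}^n \sum_{k=0}^r \frac{X_i^{2k}}{k!}$ is a sum of squares in $\mathbb{R}[X]$.

Note that (ii) implies that $f(x) + \varepsilon \sum_{i=1}^n \exp(x_i^2) \ge 0$ for all $x \in \mathbb{R}^n$ and $\varepsilon > 0$
which in turn implies (i). In condition (ii), $r$ depends on $\varepsilon$ and $f$. Using real algebra
and model theory, Netzer showed that in fact $r$ depends only on $\varepsilon$, $n$, the degree of
$f$ and a bound on the size of the coefficients of $f$ [Net, LN].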
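
To get a feeling for the size of the perturbation, the following Python sketch evaluates the truncated exponential term $\sum_{i=1}^n \sum_{k=0}^r x_i^{2k}/k!$ at a point and compares it with its limit $\sum_{i=1}^n \exp(x_i^2)$. The function names and the sample point are hypothetical choices made for this illustration; the sketch does not construct any sum of squares certificate.

```python
import math

def truncated_exp_perturbation(x, r):
    """Return sum_i sum_{k=0}^r x_i^(2k) / k! for a point x (sequence of floats)."""
    return sum((xi * xi) ** k / math.factorial(k)
               for xi in x for k in range(r + 1))

def perturbed_value(f, x, eps, r):
    """Value at x of the perturbed function f + eps * (truncated exponential term)."""
    return f(x) + eps * truncated_exp_perturbation(x, r)

point = (0.5, -1.2)
limit = sum(math.exp(xi * xi) for xi in point)
for r in (0, 2, 5, 30):
    # The truncated sums increase with r towards the exponential limit.
    print(r, truncated_exp_perturbation(point, r), limit)
```

For $r = 0$ the term is just the number of variables $n$, and every further $k$ adds the nonnegative monomials $X_i^{2k}/k!$, which is why the degree of the perturbed polynomial grows with $r$.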

1.6. "Gradient perturbation" method proposed by Jibetean and Laurent
[JL]. The most standard idea for finding the minimum of a function everybody
knows from calculus is to compute critical points, i.e., the points where the gradient 
vanishes. It is a natural question whether the power of classical differential calculus 
can be combined with the relatively new ideas using sums of squares. Fortunately, 
it can and the rest of the article will be about how to merge both concepts, sums 
of squares and differential calculus. 

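If a polynomial $f \in \mathbb{R}[X]$ attains a minimum in $x \in \mathbb{R}^n$, i.e., $f(x) \le f(y)$ for
all $y \in \mathbb{R}^n$, then the gradient $\nabla f$ of $f$ vanishes at $x$, i.e., $\nabla f(x) = 0$. However,
there are polynomials that are bounded from below on $\mathbb{R}^n$ and yet do not attain a
minimum on $\mathbb{R}^n$. The simplest example is perhaps

(6) $f := (1 - XY)^2 + Y^2 \in \mathbb{R}[X, Y]$

for which we have $f > 0$ on $\mathbb{R}^2$ but $f^* = 0$ since $\lim_{x \to \infty} f(x, \frac{1}{x}) = 0$. In the
following,

$(\nabla f) := \left( \frac{\partial f}{\partial X_1}, \dots, \frac{\partial f}{\partial X_n} \right) \subseteq \mathbb{R}[X]$

denotes the ideal generated by the partial derivatives of $f$ in $\mathbb{R}[X]$. We call this
ideal the gradient ideal of $f$.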

Without going into details, the basic idea of Jibetean and Laurent in [JL] is
again to apply a perturbation to $f$. Instead of adding a truncated exponential like
Lasserre, they just add $\varepsilon \sum_{i=1}^n X_i^{2(d+1)}$ for small $\varepsilon > 0$ when $\deg f = 2d$. If $f > 0$ on
$\mathbb{R}^n$, then the perturbed polynomial $f_\varepsilon := f + \varepsilon \sum_{i=1}^n X_i^{2(d+1)}$ is again a sum of squares
but this time only modulo its gradient ideal $(\nabla f_\varepsilon)$. In this case, this is quite easy
to prove since it turns out that this ideal will be zero-dimensional, i.e., $\mathbb{R}[X]/(\nabla f_\varepsilon)$
is a finite-dimensional real algebra. We will later see in Theorems 6 and 46 that
this finite-dimensionality is not needed for the sums of squares representation. But
the work of Jibetean and Laurent exploits the finite-dimensionality in many ways.
We refer to [JL] for details.

1.7. "Gradient variety" method by Nie, Demmel and Sturmfels [NDS].
The two perturbation methods just sketched rely on introducing very small
coefficients in a polynomial. These small coefficients might lead to SDPs which are hard
to solve because of numerical instability. It is therefore natural to think of another
method which avoids perturbation altogether. Nie, Demmel and Sturmfels considered,
for a polynomial $f \in \mathbb{R}[X]$, its gradient variety

$V(\nabla f) := \{x \in \mathbb{C}^n \mid \nabla f(x) = 0\}$.

This is the algebraic variety corresponding to the radical of the gradient ideal $(\nabla f)$.
It can be shown that a polynomial $f \in \mathbb{R}[X]$ is constant on each irreducible
component of the gradient variety (see [NDS] or use an unpublished algebraic argument
of Scheiderer based on Kähler differentials). This is the key to show that a
polynomial $f \in \mathbb{R}[X]$ nonnegative on its gradient variety is a sum of squares modulo its
gradient ideal in the case where the ideal is radical. In the general case where the
gradient ideal is not necessarily radical, the same thing still holds for polynomials
positive on their gradient variety. The following is essentially [NDS, Theorem 9]
(confer also the recent work [M2]). We will later prove a generalization of this
theorem as a byproduct. See Corollary 47 below.

Theorem 6 (Nie, Demmel and Sturmfels). For every $f \in \mathbb{R}[X]$ attaining a
minimum on $\mathbb{R}^n$, the following are equivalent.

(i) $f \ge 0$ on $\mathbb{R}^n$

(ii) $f \ge 0$ on $V(\nabla f) \cap \mathbb{R}^n$

(iii) For all $\varepsilon > 0$, there exists a sum of squares $s$ in $\mathbb{R}[X]$ such that

$f + \varepsilon \in s + (\nabla f)$.

Moreover, (ii) and (iii) are equivalent for all $f \in \mathbb{R}[X]$.

For each degree restriction $d \in \mathbb{N}_0$, the problem of computing the supremum
over all $a \in \mathbb{R}$ such that

$f - a = s + p_1 \frac{\partial f}{\partial X_1} + \dots + p_n \frac{\partial f}{\partial X_n}$

for some sum of squares $s$ in $\mathbb{R}[X]$ and polynomials $p_1, \dots, p_n$ of degree at most
$d$, can be expressed as an SDP. Theorem 6 shows that the optimal values of the
corresponding sequence of SDPs (indexed by $d$) tend to $f^*$ provided that $f$ attains
a minimum on $\mathbb{R}^n$. However, if $f$ does not attain a minimum on $\mathbb{R}^n$, the computed
sequence still tends to the infimum of $f$ on its gradient variety, which might
now be very different from $f^*$. Take for example the polynomial $f$ from (6). It is
easy to see that $V(\nabla f) = \{0\}$ and therefore the method computes $f(0) = 1$ instead
of $f^* = 0$. In [NDS, Section 7], the authors write:

"This paper proposes a method for minimizing a multivariate poly- 
nomial f(x) over its gradient variety. We assume that the infimum 
$f^*$ is attained. This assumption is non-trivial, and we do not
address the (important and difficult) question of how to verify that
a given polynomial f(x) has this property." 
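
The failure for the polynomial $f = (1 - XY)^2 + Y^2$ from (6) is easy to reproduce numerically. The following Python sketch, with partial derivatives computed by hand for this illustration, evaluates $f$ at its critical point and along the curve $y = 1/x$, which is the curve used in the text to show that the infimum is $0$.

```python
def f(x, y):
    """f = (1 - XY)^2 + Y^2, the polynomial from (6)."""
    return (1 - x * y) ** 2 + y ** 2

def grad_f(x, y):
    """Partial derivatives of f with respect to X and Y (computed by hand)."""
    return (-2 * y * (1 - x * y), -2 * x * (1 - x * y) + 2 * y)

# The origin is a critical point, and f takes the value 1 there.
print(grad_f(0, 0), f(0, 0))

# Along y = 1/x the values of f are positive and shrink like 1/x^2.
for x in (10.0, 100.0, 1000.0):
    print(x, f(x, 1 / x))
```

The first equation $-2y(1 - xy) = 0$ forces $y = 0$ or $xy = 1$; the second then gives $x = 0$ in the first case and the contradiction $y = 0$ in the second, so the origin is the only critical point.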

1.8. Our "gradient tentacle" method. The reason why the method just
described might fail is that the global infimum of a polynomial $f \in \mathbb{R}[X]$ is not
always a critical value of $f$, i.e., a value that $f$ takes on at least one of its critical
points in $\mathbb{R}^n$. Now there is a well-established notion of generalized critical values
which includes also the asymptotic critical values (a kind of critical values at infinity
we will introduce in Definition 12 below).

In this article, we will replace the real part $V(\nabla f) \cap \mathbb{R}^n$ of the gradient variety by
several larger semialgebraic sets on which the partial derivatives do not necessarily
vanish but get very small far away from the origin. These semialgebraic sets often
look like tentacles, and that is what we will call them. All tentacles we will consider
are defined by a single polynomial inequality that depends only on the polynomial

$\|\nabla f\|^2 = \left(\frac{\partial f}{\partial X_1}\right)^2 + \dots + \left(\frac{\partial f}{\partial X_n}\right)^2$

and expresses that this polynomial gets very small. Given a polynomial $f$ for which
you want to compute $f^*$, the game will consist in finding a tentacle such that two
things will hold at the same time:

• There exist suitable sums of squares certificates for nonnegativity on the
tentacle.

• The infimum of $f$ on $\mathbb{R}^n$ and on the tentacle coincide.

One can imagine that these two properties are hardly compatible. Taking $\mathbb{R}^n$ as
a tentacle would of course ensure the second condition but we have discussed
in Subsection 1.2 that the first one would be badly violated. The other extreme
would be to take the empty set as a tentacle. Then the first condition would
trivially be satisfied whereas the second would fail badly. Roughly, we will
find the balance between the two requirements as follows: The
second condition will be satisfied by known non-trivial theorems about asymptotic
behaviour of polynomials at infinity. The existence of suitable sums of squares
certificates will be based on the author's (real) algebraic work [S1] on iterated
rings of bounded elements (also called real holomorphy rings).

1.9. Contents of the article. The article is organized as follows. In Section
2, we prove a general sums of squares representation theorem which generalizes
Schmüdgen's theorem we have mentioned in Subsection 1.4. This representation
theorem is interesting in itself and will be used in the subsequent sections. In
Section 3, we introduce a gradient tentacle (see Definition 17) which is defined by
the polynomial inequality

$\|\nabla f\|^2 \|X\|^2 \le 1$.

We call this gradient tentacle principal since we can prove that it does the job in
a large number of cases (see Theorem 25) and there is hope that it works in fact
for all polynomials $f \in \mathbb{R}[X]$ bounded from below. Indeed, we have not found
any counterexamples (see Open Problem 33). In case this hope were disappointed,
we present in Section 4 a collection of other gradient tentacles (see Definition 41)
defined by the polynomial inequalities

$\|\nabla f\|^{2N} (1 + \|X\|^2)^{N+1} \le 1$ $(N \in \mathbb{N})$.

Their advantage is that if $f \in \mathbb{R}[X]$ is bounded from below and $N$ is large enough
for this particular $f$, then we can prove that the corresponding tentacle does the
job (see Theorems 46 and 50). We call these tentacles higher gradient tentacles
since the degree of the defining inequality unfortunately gets high when $N$ gets big,
which certainly has negative consequences for the complexity of solving the SDPs
arising from these tentacles. However, if $f$ attains a minimum on $\mathbb{R}^n$, then any
choice of $N \in \mathbb{N}$ will be good. Conclusions are drawn in Section 5.

2. The sums of squares representation

In this section, we prove the important sums of squares representation theorem
we will need in the following sections. It is a generalization of Schmüdgen's
Positivstellensatz (see [PD, Sch]) which is also of independent interest. Schmüdgen's result
is not to be confused with the (classical) Positivstellensatz we described in the
introduction. The connection between the two is that all known proofs of Schmüdgen's
result use the classical Positivstellensatz. Our result, Theorem 9 below, is much
harder to prove than Schmüdgen's result. Its proof relies on the theory of iterated
rings of bounded elements (also called real holomorphy rings) described in [S1].

Definition 7. For any polynomial $f \in \mathbb{R}[X]$ and subset $S \subseteq \mathbb{R}^n$, the set $R_\infty(f, S)$
of asymptotic values of $f$ on $S$ consists of all $y \in \mathbb{R}$ for which there exists a sequence
$(x_k)_{k \in \mathbb{N}}$ of points $x_k \in S$ such that

(7) $\lim_{k \to \infty} \|x_k\| = \infty$ and $\lim_{k \to \infty} f(x_k) = y$.

We now recall the important notion of a preordering of a commutative ring. 
Except in the proof of Theorem 9, we need this concept only for the ring R[X]. 

Definition 8. Let $A$ be a commutative ring (with 1). A subset $T \subseteq A$ is called
a preordering if it contains all squares $f^2$ of elements $f \in A$ and is closed under
addition and multiplication. The preordering generated by $g_1, \dots, g_m \in A$

(8) $T(g_1, \dots, g_m) = \left\{ \sum_{\delta \in \{0,1\}^m} s_\delta\, g_1^{\delta_1} \cdots g_m^{\delta_m} \;\middle|\; s_\delta \text{ is a sum of squares in } A \right\}$

is by definition the smallest preordering containing $g_1, \dots, g_m$.
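
As a small worked instance of (8), added here only for illustration, take $m = 2$. The index $\delta$ then runs over the four elements of $\{0,1\}^2$, and the symbols $s_{00}, s_{10}, s_{01}, s_{11}$ below are merely names for the corresponding sums of squares:

```latex
T(g_1, g_2) = \left\{\, s_{00} + s_{10}\, g_1 + s_{01}\, g_2 + s_{11}\, g_1 g_2
  \;\middle|\; s_{00}, s_{10}, s_{01}, s_{11} \text{ are sums of squares in } A \,\right\}
```

In general the sum in (8) has $2^m$ terms, one for each product of distinct generators.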

If $g_1, \dots, g_m \in \mathbb{R}[X]$ are polynomials, then the elements of $T(g_1, \dots, g_m)$
obviously have the geometric property that they are nonnegative on the (basic closed
semialgebraic) set $S$ they define by (9) below. The next theorem is a partial
converse. Namely, if a polynomial satisfies on $S$ some stronger geometric condition,
then it lies necessarily in $T(g_1, \dots, g_m)$. In case that $S$ is compact, the conditions
(a) and (b) below are empty and the theorem is Schmüdgen's Positivstellensatz
(see [PD, Sch]). The more general version we need here is quite hard to prove.

Theorem 9. Let $f, g_1, \dots, g_m \in \mathbb{R}[X]$ and set

(9) $S := \{x \in \mathbb{R}^n \mid g_1(x) \ge 0, \dots, g_m(x) \ge 0\}$.

Suppose that

(a) $f$ is bounded on $S$,

(b) $f$ has only finitely many asymptotic values on $S$ and all of these are positive,
i.e., $R_\infty(f, S)$ is a finite subset of $\mathbb{R}_{>0}$, and

(c) $f > 0$ on $S$.

Then $f \in T(g_1, \dots, g_m)$.

Proof. Write $R_\infty(f, S) = \{y_1, \dots, y_s\} \subseteq \mathbb{R}_{>0}$ and consider the polynomial

$h := \prod_{i=1}^s (f - y_i)$.

This polynomial is "on $S$ small at infinity" by which we mean that for every $\varepsilon > 0$
there exists $k \in \mathbb{N}$ such that for all $x \in S$ with $\|x\| \ge k$, we have $|h(x)| < \varepsilon$.
To show this, assume the contrary. Then there exists $\varepsilon > 0$ and a sequence
$(x_k)_{k \in \mathbb{N}}$ of points $x_k \in S$ with $\lim_{k \to \infty} \|x_k\| = \infty$ and

(10) $|h(x_k)| \ge \varepsilon$ for all $k \in \mathbb{N}$.

Because the sequence $(f(x_k))_{k \in \mathbb{N}}$ is bounded by hypothesis (a), we find an infinite
subset $I \subseteq \mathbb{N}$ such that the subsequence $(f(x_k))_{k \in I}$ converges. The limit must
be one of the asymptotic values of $f$ on $S$, i.e., $\lim_{k \in I, k \to \infty} f(x_k) = y_i$ for some
$i \in \{1, \dots, s\}$. Using (a), it follows that $\lim_{k \in I, k \to \infty} h(x_k) = 0$ contradicting (10).
Let $A := (\mathbb{R}[X], T)$ where $T := T(g_1, \dots, g_m)$. The set

$H'(A) := \{p \in \mathbb{R}[X] \mid N \pm p \in T \text{ for some } N \in \mathbb{N}\}$

is a subring of $A$ (see, e.g., [S1, Definition 1.2]). We endow $H'(A)$ with the
preordering $T' := T \cap H'(A)$ and consider it also as a preordered ring. By [S1,
Corollary 3.7], the smallness of $h$ at infinity proved above is equivalent to $h \in S_\infty(A)$
in the notation of [S1]. By [S1, Corollary 4.17], we have $S_\infty(A) \subseteq H'(A)$ and
consequently $h \in H'(A)$. The advantage of $H'(A)$ over $A$ is that its preordering is
archimedean, i.e., $T' + \mathbb{Z} = H'(A)$. According to an old criterion for an element
to be contained in an archimedean preordering (see for example [PD, Proposition
5.2.3 and Lemma 5.2.7] or [S1, Theorem 1.3]), our claim $f \in T'$ follows if we can
show that $\varphi(f) > 0$ for all ring homomorphisms $\varphi : H'(A) \to \mathbb{R}$ with $\varphi(T') \subseteq \mathbb{R}_{\ge 0}$.
For all such homomorphisms possessing an extension $\bar\varphi : A \to \mathbb{R}$ with $\bar\varphi(T) \subseteq \mathbb{R}_{\ge 0}$,
this follows from hypothesis (c) because it is easy to see that such an extension $\bar\varphi$
must be evaluation $p \mapsto p(x)$ in the point $x := (\bar\varphi(X_1), \dots, \bar\varphi(X_n)) \in S$. Using
the theory in [S1], we will see that the only possibility for such a $\varphi$ not to have
such an extension $\bar\varphi$ is that $\varphi(h) = 0$. Then we will be done since $\varphi(h) = 0$ implies
$\varphi(f) = y_i > 0$ for some $i$. We have used here that $f \in H'(A)$ which follows from
$h \in H'(A)$ since $H'(A)$ is integrally closed in $A$ (see [S1, Theorem 5.3]).

So let us now use [S1]. By [S1, Corollary 3.7 and Theorem 4.18], the smallness
of $h$ at infinity means that

$A_h = H'(A)_h$

where we deal on both sides of this equation with the localization of a preordered
ring by the element $h$ (see [S1, pages 24 and 25]). If $\varphi : H'(A) \to \mathbb{R}$ is a ring
homomorphism with $\varphi(T') \subseteq \mathbb{R}_{\ge 0}$ and $\varphi(h) \ne 0$, then $\varphi$ extends to a ring
homomorphism $\tilde\varphi : A_h = H'(A)_h \to \mathbb{R}$ with $\tilde\varphi(T_h) = \tilde\varphi(T'_h) \subseteq \mathbb{R}_{\ge 0}$. Then $\bar\varphi := \tilde\varphi|_A$ is
the desired extension of $\varphi$. □

Example 10. Consider the polynomials

(11) $h_N := 1 - Y^N (1 + X)^{N+1} \in \mathbb{R}[X, Y]$ $(N \in \mathbb{N})$

in two variables. We fix $N \in \mathbb{N}$ and apply Theorem 9 with $f = h_{N+1}$, $m = 3$,
$g_1 = X$, $g_2 = Y$ and $g_3 = h_N$. The set $S$ defined by the $g_i$ as in (9) is a subset of
the first quadrant which is bounded in $Y$-direction but unbounded in $X$-direction.
Of course, we have $0 \le h_N \le 1$ and

$0 \le Y(1 + X) \le \frac{1}{\sqrt[N]{1 + X}} \le 1$ on $S$

showing that $0$ is the only asymptotic value of

$1 - h_{N+1} = (1 - h_N)\, Y (1 + X)$

on $S$ and therefore $R_\infty(h_{N+1}, S) = \{1\}$. It follows also that $0 \le h_{N+1} \le 1$ on $S$.
By Theorem 9, we obtain

(12) $h_{N+1} + \varepsilon \in T(X, Y, h_N)$

for all $\varepsilon > 0$.

The following lemma shows that (12) holds even for $\varepsilon = 0$, a fact that does not
follow from Theorem 9. This lemma will be interesting later to compare the quality
of certain SDP relaxations (see Proposition 49). In its proof, we will explicitly
construct a representation of $h_{N+1}$ as an element of $T(X, Y, h_N)$. Only part of this
explicit representation will be needed in the sequel, namely an explicit polynomial
$g \in T(X, Y)$ such that $h_{N+1} \in T(X, Y) + g h_N \subseteq T(X, Y, h_N)$. This explains the
formulation of the statement. Theorem 9 will not be used in the proof but gave us
good hope before we had the proof. The role of Theorem 9 in this article is above
all to prove Theorems 25 and 46 below.



Lemma 11. For the polynomials $h_N$ defined by (11), we have

$h_{N+1} - \left(1 + \frac{1}{N}\right) Y (1 + X)\, h_N \in T(X, Y)$.

Proof. For a new variable $Z$,

$$\begin{aligned}
(Z - 1)^2 \sum_{k=0}^{N-1} (N - k) Z^k
&= (Z - 1)^2 \left( N \sum_{k=0}^{N-1} Z^k - Z \sum_{k=1}^{N-1} k Z^{k-1} \right) \\
&= (Z - 1)^2 \left( N\, \frac{Z^N - 1}{Z - 1} - Z\, \frac{\partial}{\partial Z} \left( \frac{Z^N - 1}{Z - 1} \right) \right) \\
&= N (Z - 1)(Z^N - 1) - Z \left( (Z - 1) N Z^{N-1} - (Z^N - 1) \right) \\
&= Z^{N+1} - (N + 1) Z + N.
\end{aligned}$$

Specializing $Z$ to $z := Y(1 + X)$, we have therefore

$$\begin{aligned}
N h_{N+1} - (N + 1) z h_N
&= N (1 - z^{N+1} (1 + X)) - (N + 1) z (1 - z^N (1 + X)) \\
&= z^{N+1} X + (z^{N+1} - (N + 1) z + N) \\
&= z^{N+1} X + (z - 1)^2 \sum_{k=0}^{N-1} (N - k) z^k \in T(X, Y).
\end{aligned}$$

Dividing by $N = (\sqrt{N})^2$ yields our claim. □
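
The polynomial identity $(Z - 1)^2 \sum_{k=0}^{N-1} (N - k) Z^k = Z^{N+1} - (N + 1) Z + N$ at the heart of this proof can be checked mechanically for small $N$. The Python sketch below is a plain verification aid with hypothetical helper names; it represents polynomials in $Z$ as integer coefficient lists (index = exponent) and compares both sides exactly.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = exponent)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def lhs(n):
    """Coefficients of (Z - 1)^2 * sum_{k=0}^{n-1} (n - k) Z^k, for n >= 1."""
    square = [1, -2, 1]                  # (Z - 1)^2 = 1 - 2Z + Z^2
    weights = [n - k for k in range(n)]  # coefficients n, n-1, ..., 1
    return poly_mul(square, weights)

def rhs(n):
    """Coefficients of Z^(n+1) - (n + 1) Z + n, for n >= 1."""
    coeffs = [0] * (n + 2)
    coeffs[0] += n
    coeffs[1] -= n + 1
    coeffs[n + 1] += 1
    return coeffs

for n in range(1, 6):
    print(n, lhs(n) == rhs(n), rhs(n))
```

Since all coefficients $N - k$ on the left are positive, the left-hand side evaluated at $z = Y(1 + X)$ is visibly a square times an element of $T(X, Y)$, which is the point of rewriting the right-hand side in this form.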

3. The principal gradient tentacle 

In this section, we associate to every polynomial $f \in \mathbb{R}[X]$ a gradient tentacle
which is a subset of $\mathbb{R}^n$ containing the real part of the gradient variety of $f$ and
defined by a single polynomial inequality whose degree is not more than twice the
degree of $f$. The infimum of any polynomial $f \in \mathbb{R}[X]$ bounded from below on
$\mathbb{R}^n$ will coincide with the infimum on its principal gradient tentacle (see Theorem
19). Under some technical assumption (see Definition 20) which is not known to be
necessary (see Open Problem 33), we prove a sums of squares certificate for
nonnegativity of $f$ on its principal gradient tentacle which is suitable for optimization
purposes. This representation theorem (Theorem 25) is of independent interest and
its proof is mainly based on the nontrivial representation theorem from the
previous section and a result of Parusiński on the behaviour of polynomials at infinity
([P1, Theorem 1.4]). In Subsection 3.2, we outline how to get a sequence of SDPs
growing in size whose optimal values tend to $f^*$ for any $f$ satisfying the conditions
of Theorem 25 (or perhaps for any $f$ with $f^* > -\infty$ if the answer to Open Problem
33 is yes). In Subsections 3.3 and 3.4, we give MATLAB code for the sums
of squares optimization toolboxes YALMIP [Lof] and SOSTOOLS [PPS] that
produces and solves these SDP relaxations. This short and simple code is meant for
readers who have little experience with such toolboxes and nevertheless want to try
our proposed method on their own. In Subsection 3.5, we provide simple examples
which have been calculated using the YALMIP code from Subsection 3.3.

We start by recalling the concept of asymptotic critical values developed by
Rabier in his 1997 milestone paper [Rab]. For simplicity we stay in the setting of
real polynomials right from the beginning (though part of this theory makes sense
in a much broader context).

Definition 12. Suppose $f \in \mathbb{R}[X]$. The set $K_0(f)$ of critical values of $f$ consists
of all $y \in \mathbb{R}$ for which there exists $x \in \mathbb{R}^n$ such that $\nabla f(x) = 0$ and $f(x) = y$. The
set $K(f)$ of generalized critical values of $f$ consists of all $y \in \mathbb{R}$ for which there
exists a sequence $(x_k)_{k \in \mathbb{N}}$ in $\mathbb{R}^n$ such that

(13) $\lim_{k \to \infty} \|\nabla f(x_k)\| (1 + \|x_k\|) = 0$ and $\lim_{k \to \infty} f(x_k) = y$.

The set $K_\infty(f)$ of asymptotic critical values consists of all $y \in \mathbb{R}$ for which there
exists a sequence $(x_k)_{k \in \mathbb{N}}$ in $\mathbb{R}^n$ such that $\lim_{k \to \infty} \|x_k\| = \infty$ and (13) hold.

The following proposition is easy. 

Proposition 13. The set of generalized critical values of a polynomial $f \in \mathbb{R}[X]$
is the union of its set of critical and asymptotic critical values, i.e.,

$K(f) = K_0(f) \cup K_\infty(f)$.

The following notions go back to Thom [Tho]. 

Definition 14. Suppose $f \in \mathbb{R}[X]$. We say that $y \in \mathbb{R}$ is a typical value of
$f$ if there is a neighbourhood $U$ of $y$ in $\mathbb{R}$ such that
$f|_{f^{-1}(U)} : f^{-1}(U) \to U$ is a (not necessarily surjective) trivial smooth
fiber bundle, i.e., there exists a smooth (i.e., $C^\infty$) manifold $F$ and a $C^\infty$ diffeomorphism
$\Phi : f^{-1}(U) \to F \times U$ such that $f|_{f^{-1}(U)} = \pi_2 \circ \Phi$ where $\pi_2 : F \times U \to U$ is the
canonical projection. We call $y \in \mathbb{R}$ an atypical value of $f$ if it is not a typical
value of $f$. The set of all atypical values of $f$ is denoted by $B(f)$ and called the
bifurcation set of $f$.

Note that a $\Phi$ like in the above definition induces a $C^\infty$ diffeomorphism $f^{-1}(y) \to
F \times \{y\} \cong F$ for every $y \in U$. In this context, the preimages $f^{-1}(y)$ are called
fibers and $F$ is called the fiber. We do not require that the fiber bundle $f|_{f^{-1}(U)} :
f^{-1}(U) \to U$ is surjective (if it is not then the image is necessarily empty). Hence
the fiber $F$ may be empty and a typical value is not necessarily a value taken on by
$f$. We make use of the following well-known theorem (see, e.g., [KOS, Theorem 3.1]).

Theorem 15. Suppose $f \in \mathbb{R}[X]$. Then $B(f) \subseteq K(f)$ and $K(f)$ is finite.

The advantage of $K(f)$ over $K_0(f)$ is that $f^* \in K(f)$ even if $f$ does not attain
a minimum on $\mathbb{R}^n$. This is an easy consequence of Theorem 15. See Theorem 19
below.

Example 16. Consider again the polynomial $f = (1 - XY)^2 + Y^2 \in \mathbb{R}[X, Y]$ from (6)
that does not attain its infimum $f^* = 0$ on $\mathbb{R}^2$. Calculating the partial derivatives,
it is easy to see that the origin is the only critical point of $f$. Because $f$ takes
the value 1 at the origin, we have $K_0(f) = \{1\}$ and therefore $f^* = 0 \notin K_0(f)$.
Clearly, we have $0 \in B(f)$ since $f^{-1}(-y) = \emptyset \neq f^{-1}(y)$ for small $y \in \mathbb{R}_{>0}$. By
Theorem 15, we have therefore $0 \in K_\infty(f) \subseteq K(f)$. To show this directly, a
first guess would be that $\|\nabla f(x, \frac{1}{x})\| (1 + \|(x, \frac{1}{x})\|)$ tends to zero when $x \to \infty$
because $\lim_{x \to \infty} f(x, \frac{1}{x}) = 0$. But in fact, this expression tends to 2 when $x \to \infty$.
However, a calculation shows that $\lim_{x \to \infty} \|\nabla f(x, \frac{1}{x} - \frac{1}{x^3})\| (1 + \|(x, \frac{1}{x} - \frac{1}{x^3})\|) = 0$.
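
The two limits claimed in Example 16 can be checked numerically. The following is a minimal sketch in Python (standard library only); the function names are hypothetical and chosen only for this illustration. It evaluates the quantity $\|\nabla f(x, y)\|(1 + \|(x, y)\|)$ from (13) along the curves $y = 1/x$ and $y = 1/x - 1/x^3$ for $f = (1 - XY)^2 + Y^2$.

```python
import math

def grad_f(x, y):
    """Gradient of f = (1 - x*y)**2 + y**2."""
    r = 1.0 - x * y
    return (-2.0 * y * r, -2.0 * x * r + 2.0 * y)

def weighted_gradient_norm(x, y):
    """The quantity ||grad f(x, y)|| * (1 + ||(x, y)||) appearing in (13)."""
    gx, gy = grad_f(x, y)
    return math.hypot(gx, gy) * (1.0 + math.hypot(x, y))

def on_first_curve(x):
    """Evaluate along y = 1/x (the 'first guess')."""
    return weighted_gradient_norm(x, 1.0 / x)

def on_second_curve(x):
    """Evaluate along y = 1/x - 1/x^3."""
    return weighted_gradient_norm(x, 1.0 / x - 1.0 / x**3)

for x in (10.0, 100.0, 1000.0):
    print(x, on_first_curve(x), on_second_curve(x))
```

Along the first curve the printed values should stay close to 2, along the second they should decrease towards 0. Floating point arithmetic is used here, so for very large $x$ cancellation errors would eventually dominate.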



Definition 17. For a polynomial $f \in \mathbb{R}[X]$, we call

$S(\nabla f) := \{x \in \mathbb{R}^n \mid \|\nabla f(x)\| \, \|x\| \le 1\}$

the principal gradient tentacle of $f$.

Remark 18. In the definition of $S(\nabla f)$, the inequality $\|\nabla f(x)\| \, \|x\| \le 1$ could be
replaced by $\|\nabla f(x)\| \, \|x\| \le R$ for some constant $R > 0$. Then all subsequent
results will still hold with obvious modifications. Using an $R$ different from 1 might
have in certain cases a practical advantage (see Subsection 3.6 below). However,
we decided to stay with this definition in order to avoid becoming too technical and to keep
the paper readable.

As expressed by the notation $S(\nabla f)$, polynomials $f$ with the same gradient $\nabla f$
have the same gradient tentacle, in other words

$S(\nabla(f + a)) = S(\nabla f)$ for all $a \in \mathbb{R}$.

The first important property of $S(\nabla f)$ is stated in the following immediate
consequence of Theorem 15.

Theorem 19. Suppose $f \in \mathbb{R}[X]$ is bounded from below. Then $f^* \in K(f)$ and
therefore $f^* = \inf\{f(x) \mid x \in S(\nabla f)\}$.

Proof. By Theorem 15, it suffices to show that $f^* \in B(f)$. Assume that $f^* \notin B(f)$,
i.e., $f^*$ is a typical value of $f$. Then for all $y$ in a neighbourhood of $f^*$, the fibers
$f^{-1}(y)$ are smoothly diffeomorphic to each other. But this is absurd since $f^{-1}(y)$
is empty for $y < f^*$ but nonempty for some $y$ in every neighbourhood of $f^*$. □

Let $\mathbb{P}^{n-1}(\mathbb{C})$ denote the $(n-1)$-dimensional complex projective space over $\mathbb{C}$. For
a homogeneous polynomial $f$ and a point $z \in \mathbb{P}^{n-1}(\mathbb{C})$, we simply say $f(z) = 0$ to
express that $f$ vanishes on (a non-zero point of) the straight line $z \subseteq \mathbb{C}^n$. Following
[P1], we give the following definition.

Definition 20. We say that a polynomial $f \in \mathbb{C}[X]$ has only isolated singularities
at infinity if $f \in \mathbb{C}$ (i.e., $f$ is constant) or $d := \deg f \ge 1$ and there are only finitely
many $z \in \mathbb{P}^{n-1}(\mathbb{C})$ such that

(14)    $\frac{\partial f_d}{\partial X_1}(z) = \dots = \frac{\partial f_d}{\partial X_n}(z) = f_{d-1}(z) = 0$

where $f = \sum_i f_i$ and each $f_i \in \mathbb{C}[X]$ is zero or homogeneous of degree $i$.

As shown in [P1, Section 1.1], the geometric interpretation of the above definition
is that the projective closure of a generic fiber of $f$ has only isolated singularities.

Remark 21. A generic complex polynomial has only isolated singularities at infinity.
In fact, much more is true: A generic polynomial $f \in \mathbb{C}[X]$ of degree $d \ge 1$ has
no isolated singularities at infinity in the sense that there is no $z \in \mathbb{P}^{n-1}(\mathbb{C})$ such
that (14) holds. In more precise words, to every $d \ge 2$, there exists a complex
polynomial relation that is valid for all coefficient tuples of polynomials $f \in \mathbb{C}[X]$
of degree $d$ for which (14) has an infinite number of solutions. This follows from
the fact that for a generic homogeneous polynomial $g \in \mathbb{C}[X]$ of degree $d \ge 1$, there
are only finitely many points $z \in \mathbb{P}^{n-1}(\mathbb{C})$ such that $\frac{\partial g}{\partial X_i}(z) = 0$ for all $i$. See [Kus,
Théorème II] or [Shu, Proposition 1.1.1].

Remark 22. In the two variable case $n = 2$, every polynomial $f \in \mathbb{C}[X]$ has only
isolated singularities at infinity. This is clear since (14) defines an algebraic subvariety
of $\mathbb{P}^1(\mathbb{C})$.

The following theorem follows easily from [P1, Theorem 1.4].

Theorem 23. Suppose $f \in \mathbb{R}[X]$ has only isolated singularities at infinity. Then

$R_\infty(f, S(\nabla f)) \subseteq K(f)$.

In particular, $R_\infty(f, S(\nabla f))$ is finite, i.e., $f$ has only finitely many asymptotic
values on its principal gradient tentacle.

Proof. Let $(x_k)_{k \in \mathbb{N}}$ be a sequence of points $x_k \in S(\nabla f)$ and $y \in \mathbb{R}$ such that
$\lim_{k \to \infty} \|x_k\| = \infty$ and $\lim_{k \to \infty} f(x_k) = y \notin K_0(f)$. We show that $y \in K_\infty(f)$
using implication (i) $\Rightarrow$ (ii) in [P1, Theorem 1.4]. Because of our sequence $(x_k)_{k \in \mathbb{N}}$,
it is impossible that there exist $N \ge 1$ and $\delta > 0$ such that for all $x \in \mathbb{R}^n$ with
$\|x\|$ sufficiently large and $f(x)$ sufficiently close to $y$, we have

$\|x\| \, \|\nabla f(x)\| \ge \delta \sqrt[N]{\|x\|}$.

This means that condition (ii) in [P1, Theorem 1.4] is violated. The implication
(i) $\Rightarrow$ (ii) in [P1, Theorem 1.4] yields that $y \in B(f)$ (here we use that $y \notin K_0(f)$).
But $B(f) \subseteq K(f)$ by Theorem 15. This shows $y \in K(f) \setminus K_0(f) \subseteq K_\infty(f)$ by
Proposition 13. □

Lemma 24. Every $f \in \mathbb{R}[X]$ is bounded on $S(\nabla f)$.

Proof. By the Łojasiewicz inequality at infinity [Spo, Theorem 1], there exist $c_1, c_2 \in
\mathbb{N}$ such that for all $x \in \mathbb{C}^n$,

$|f(x)| \ge c_1 \implies |f(x)| \le c_2 \|\nabla f(x)\| \, \|x\|$.

Then $|f| \le \max\{c_1, c_2\}$ on $S(\nabla f)$. □

3.1. The principal gradient tentacle and sums of squares. Here comes one
of the main results of this article which is interesting on its own but can later be
read as a convergence result for a sequence of optimal values of SDPs (Theorem 30
below).

Theorem 25. Let $f \in \mathbb{R}[X]$ be bounded from below. Furthermore, suppose that $f$
has only isolated singularities at infinity (which is always true in the two variable
case $n = 2$) or the principal gradient tentacle $S(\nabla f)$ is compact. Then the following
are equivalent.

(i) $f \ge 0$ on $\mathbb{R}^n$
(ii) $f \ge 0$ on $S(\nabla f)$
(iii) For every $\varepsilon > 0$, there are sums of squares of polynomials $s$ and $t$ in $\mathbb{R}[X]$
such that

(15)    $f + \varepsilon = s + t(1 - \|\nabla f\|^2 \|X\|^2)$.

Proof. First of all, the polynomial $g := 1 - \|\nabla f\|^2 \|X\|^2$ is a polynomial describing
the principal gradient tentacle

$S := \{x \in \mathbb{R}^n \mid g(x) \ge 0\} = S(\nabla f)$.

Because sums of squares of polynomials are globally nonnegative on $\mathbb{R}^n$, identity
(15) can be viewed as a certificate for $f \ge -\varepsilon$ on $S$. Hence it is clear that (iii)
implies (ii). For the reverse implication, we apply Theorem 9 (with $m = 1$ and
$g_1 := g$) to $f + \varepsilon$ instead of $f$. We only have to check the hypotheses. Condition (a)
is clear from Lemma 24. By Theorem 23, we have that $R_\infty(f, S)$ is a finite set if
$f$ has only isolated singularities at infinity. If $S(\nabla f)$ is compact, the set $R_\infty(f, S)$
is even empty. Since $f \ge 0$ on $S$ by hypothesis, this set clearly contains only
nonnegative numbers. This shows condition (b), i.e., $R_\infty(f + \varepsilon, S) = \varepsilon + R_\infty(f, S)$
is a finite subset of $\mathbb{R}_{>0}$. Finally, the hypothesis $f \ge 0$ on $S$ gives $f + \varepsilon > 0$ on
$S$ which is condition (c). Therefore (ii) and (iii) are proved to be equivalent. The
equivalence of (i) and (ii) is an immediate consequence of Theorem 19. □

Remark 26. Let $f \in \mathbb{R}[X]$ be bounded from below and $S(\nabla f)$ be compact. Then
$f$ attains its infimum $f^*$. To see this, observe that the equivalence of (i) and (ii)
in the preceding theorem implies

$f^* = \sup\{a \in \mathbb{R} \mid f - a \ge 0 \text{ on } \mathbb{R}^n\}$

$= \sup\{a \in \mathbb{R} \mid f - a \ge 0 \text{ on } S(\nabla f)\}$

$= \min\{f(x) \mid x \in S(\nabla f)\}$.

The following observation is proved in the same way as Remark 2.

Remark 27. If $f$ is a sum of squares in the ring of formal power series, then
its lowest (non-vanishing) homogeneous part must be a sum of squares in $\mathbb{R}[X]$.

Remark 28. There are polynomials $f \in \mathbb{R}[X]$ such that $f \ge 0$ on $\mathbb{R}^n$ but there
is no representation (15) for $\varepsilon = 0$. To see this, take a polynomial $f \in \mathbb{R}[X]$
such that $f \ge 0$ on $\mathbb{R}^n$ but $f$ is not a sum of squares in the ring $\mathbb{R}[[X]]$ of formal
power series (the Motzkin polynomial from (5) is such an example by the preceding
remark). Then a representation (15) with $\varepsilon = 0$ is impossible since the polynomial
$1 - \|\nabla f\|^2 \|X\|^2$ has a positive constant term and is therefore a square in $\mathbb{R}[[X]]$.

3.2. Optimization using the gradient tentacle and sums of squares. Theorem 25
shows that under certain conditions, computation of $f^*$ amounts to computing
the supremum over all $a$ such that $f - a = s + t(1 - \|\nabla f\|^2 \|X\|^2)$ for some
sums of squares $s$ and $t$ in $\mathbb{R}[X]$. As sketched in the introduction, sums of squares of
bounded degree can be nicely parametrized by positive semidefinite matrices. This
motivates the following definition.

Definition 29. For all polynomials $f \in \mathbb{R}[X]$ and all $k \in \mathbb{N}_0$, we define $f_k^* \in
\mathbb{R} \cup \{\pm\infty\}$ as the supremum over all $a \in \mathbb{R}$ such that $f - a$ can be written as a sum

(16)    $f - a = s + t(1 - \|\nabla f\|^2 \|X\|^2)$

where $s$ and $t$ are sums of squares of polynomials with $\deg t \le 2k$.

Here and in the following, we use the convention that the degree of the zero
polynomial is $-\infty$ so that $t = 0$ is allowed in the above definition. Note that when
the degree of $t$ in (16) is restricted then automatically also the degree of $s$.

Therefore the problem of computing $f_k^*$ can be written as an SDP. How to do
this is already suggested in our introduction. It works exactly as in the
well-known method of Lasserre for optimization of polynomials on compact basic closed
semialgebraic sets. We refer to [L1, M1, Sr2] for the details. In any case, there are
several toolboxes for MATLAB (a software for numerical computation) which can
be used to create and solve the corresponding SDPs without knowing these details.
The toolboxes we know are YALMIP [Lof] (which is very flexible and good for much
more than sums of squares), SOSTOOLS [PPS] (which has a very flexible and
nice syntax), GloptiPoly [HL] (very easy to use for simple problems) and SparsePOP
[KKW] (specialized for sparse polynomials). Besides MATLAB and such a toolbox,
one also needs an SDP solver for which the toolbox provides an interface.

A side remark that we want to make here is that to each SDP there is a dual 
SDP and it is desirable from the theoretical and practical point of view that strong 
duality holds, i.e., the optimal value of the primal and dual SDP coincide. For the 
SDPs arising from Definition 29, strong duality holds. This follows from the fact 
that principal gradient tentacles (unlike gradient varieties) always have non-empty 
interior (they always contain a small neighbourhood of the origin). For a proof 
confer [L1, Theorem 4.2], [M1, Corollary 3.2] or [Sr2, Corollary 21]. Here we will
not define the dual SDP nor discuss its interpretation in terms of the so-called 
moment problem. 

Recalling the definition of $f^{\mathrm{sos}}$ in (4), we have obviously

(17)    $f^{\mathrm{sos}} \le f_0^* \le f_1^* \le f_2^* \le \cdots$

and if $f$ is bounded from below, then all $f_k^*$ are lower bounds (perhaps $-\infty$) of $f^*$
by Theorem 19. Note that the technique from Jibetean and Laurent (see Subsection
1.6 above) gives upper bounds for $f^*$ so that it complements nicely our method. It is
easy to see that Theorem 25 can be expressed in terms of the sequence $f_0^*, f_1^*, f_2^*, \ldots$
as follows.

Theorem 30. Let $f \in \mathbb{R}[X]$ be bounded from below. Suppose that $f$ has only
isolated singularities at infinity (e.g., $n = 2$) or the principal gradient tentacle
$S(\nabla f)$ is compact. Then the sequence $(f_k^*)_{k \in \mathbb{N}}$ converges monotonically increasing
to $f^*$.

The following example shows that it is unfortunately in general not true that
$f_k^* = f^*$ for big $k \in \mathbb{N}$.

Example 31. Let $f$ be the Motzkin polynomial from (5). By Theorem 30, we have
$\lim_{k \to \infty} f_k^* = 0$. But it is not true that $f_k^* = 0$ for some $k \in \mathbb{N}$. By Definition 29,
this would imply that for all $\varepsilon > 0$, there is an identity (15) with sums of squares
$s$ and $t$ such that $\deg s \le k$. Because $S(\nabla f)$ has non-empty interior (note that
$\nabla f(1, 1, 1) = 0$ since $f(1, 1, 1) = 0$), we can use [PS, Proposition 2.6(b)] (see [Sr2,
Theorem 4.5] for a more elementary exposition) to see that such an identity would
then also have to exist for $\varepsilon = 0$. But this is impossible as we have seen in Remark
28.

Unfortunately, the assumption that $f$ is bounded from below is necessary in
Theorem 30 as shown by the following trivial example.

Example 32. Consider $f := X \in \mathbb{R}[X]$ (i.e., let $n = 1$ and write $X$ instead of $X_1$).
Then $K(f) = \emptyset$, $S(\nabla f) = [-1, 1]$ and $(f_k^*)_{k \in \mathbb{N}}$ converges monotonically increasing
to $\inf\{f(x) \mid -1 \le x \le 1\} = -1 \neq -\infty = f^*$.

Open Problem 33. Do Theorems 25 and 30 hold without the hypothesis that f 
has only isolated singularities at infinity or S(Vf) is compact? 

By the above arguments, it is easy to see that this question could be answered in
the affirmative if $R_\infty(f, S(\nabla f))$ were finite for all polynomials $f \in \mathbb{R}[X]$ bounded
from below on $\mathbb{R}^n$. But this is not true as the following counterexample shows. We
are grateful to Zbigniew Jelonek for pointing out to us this adaptation of an example
of Parusiński [P2, Example 1.11].

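Example 34. Consider the polynomial $h := X + X^2 Y + X^4 Y Z \in \mathbb{R}[X, Y, Z]$, set
$f := h^2$ and define for fixed $a > 0$ the curve

$\gamma \colon \mathbb{R}_{>0} \to \mathbb{R}^3, \quad s \mapsto \left( s, \ \frac{2a}{s^2}, \ -\frac{1 + \frac{s}{4a}}{2 s^2} \right)$.

Observe that

$h(\gamma(s)) = \tfrac{3}{4} s + a$ and $\frac{\partial h}{\partial X}(\gamma(s)) = 0$

and therefore $f(\gamma(s)) = (\tfrac{3}{4} s + a)^2$ and

$\|\nabla f\|^2(\gamma(s)) = 4 f \|\nabla h\|^2(\gamma(s)) = 4 \left( \tfrac{3}{4} s + a \right)^2 s^4 \left( \tfrac{1}{4} \left( 1 - \tfrac{s}{4a} \right)^2 + (2a)^2 \right)$.

It follows that $\|\nabla f\|^2(\gamma(s)) \|\gamma(s)\|^2$ equals

$\left( \tfrac{3}{4} s + a \right)^2 \left( \tfrac{1}{4} \left( 1 - \tfrac{s}{4a} \right)^2 + (2a)^2 \right) \left( 4 s^6 + 16 a^2 + \left( 1 + \tfrac{s}{4a} \right)^2 \right)$

which tends to $(16a^2 + 1) a^2 (1/4 + 4a^2)$ for $s \to 0$. We now see that for $s \to 0$, $\|\gamma(s)\|$
tends to infinity, $f(\gamma(s))$ tends to $a^2$ and, when $a$ is a sufficiently small positive
number, $\|\nabla f\|^2(\gamma(s)) \|\gamma(s)\|^2$ tends to a real number smaller than 1. This shows
that $a^2 \in R_\infty(f, S(\nabla f))$ for every sufficiently small positive number $a$. Hence $f$ is
an example of a polynomial bounded from below such that $R_\infty(f, S(\nabla f))$ is infinite.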
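
The identities used in Example 34 can be verified with exact rational arithmetic. The sketch below is written in Python (standard library only) and assumes that the curve reads $\gamma(s) = (s, \ 2a/s^2, \ -(1 + s/(4a))/(2s^2))$, which is consistent with the stated values $h(\gamma(s)) = \frac{3}{4} s + a$ and $\frac{\partial h}{\partial X}(\gamma(s)) = 0$. All function names are hypothetical and serve only this illustration.

```python
from fractions import Fraction as Fr

def gamma(s, a):
    """Assumed curve gamma(s) = (s, 2a/s^2, -(1 + s/(4a)) / (2 s^2))."""
    return (s, 2 * a / s**2, -(1 + s / (4 * a)) / (2 * s**2))

def h(x, y, z):
    return x + x**2 * y + x**4 * y * z

def grad_h(x, y, z):
    return (1 + 2 * x * y + 4 * x**3 * y * z,
            x**2 + x**4 * z,
            x**4 * y)

def tentacle_quantity(s, a):
    """||grad f||^2 * ||gamma||^2 at gamma(s), using f = h^2 and grad f = 2 h grad h."""
    p = gamma(s, a)
    hv = h(*p)
    grad_sq = sum(c * c for c in grad_h(*p))
    return 4 * hv * hv * grad_sq * sum(c * c for c in p)

def limit_value(a):
    """Claimed limit (16 a^2 + 1) a^2 (1/4 + 4 a^2) for s -> 0."""
    return (16 * a**2 + 1) * a**2 * (Fr(1, 4) + 4 * a**2)

a = Fr(1, 10)
for s in (Fr(1, 10), Fr(1, 100), Fr(1, 1000)):
    p = gamma(s, a)
    assert h(*p) == Fr(3, 4) * s + a   # h(gamma(s)) = 3s/4 + a, exactly
    assert grad_h(*p)[0] == 0          # partial derivative with respect to X vanishes
    print(float(s), float(tentacle_quantity(s, a)), float(limit_value(a)))
```

For the sample value $a = 1/10$ the limit is well below 1, so the points $\gamma(s)$ lie in $S(\nabla f)$ for small $s$ while $f(\gamma(s))$ approaches $a^2$. Other small positive values of $a$ can be tried in the same way.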

3.3. Implementation in YALMIP. We show here how to encode computation
of $f_k^*$ (as well as of $f_{-1}^* := f^{\mathrm{sos}}$) for any $k \in \mathbb{N}_0$ with YALMIP. First you have to
declare the variables appearing in the polynomial $f$ (here x and y) as well as the
variable a to maximize.

sdpvar x y a

Now you specify the polynomial $f$ and the degree bound $k$ ($-1$ for computing
$f^{\mathrm{sos}}$). Here we take the dehomogenization $f := M(X, Y, 1)$ where $M$ is the Motzkin
polynomial introduced in (5).

f = x^4 * y^2 + x^2 * y^4 - 3 * x^2 * y^2 + 1, k = 0

Now compute the partial derivatives with respect to the variables (here x and y)
and specify the polynomial $g$ defining the gradient tentacle.

df = jacobian(f, [x y]), g = 1 - (df(1)^2 + df(2)^2) * (x^2 + y^2)

Define a polynomial variable $t$ of degree $\le 2k$ and impose the constraints that $t$
and $f - a - tg$ are sums of squares (for some reason the current version of YALMIP
does not accept a degree zero polynomial $t$ here so that this has to be modeled as
a scalar variable).

if k > 0
    v = monolist([x; y], 2*k), coeffVec = sdpvar(length(v), 1)
    t = coeffVec' * v
    constraints = set(sos(f - a - t * g)) + set(sos(t))
elseif k == 0
    coeffVec = sdpvar(1, 1), t = coeffVec
    constraints = set(sos(f - a - t * g)) + set(t > 0)
else
    coeffVec = []
    constraints = set(sos(f - a))
end

Now solve the SDP and output the result for a.

solvesos(constraints, -a, [], [a; coeffVec]), double(a)

3.4. Implementation in SOSTOOLS. Below we give SOSTOOLS code which is
even slightly easier to read but essentially analogous to the YALMIP code. In
contrast to the YALMIP code above, the MATLAB Symbolic Math Toolbox is required
to execute the code below.

syms x y a t
f = x^4 * y^2 + x^2 * y^4 - 3 * x^2 * y^2 + 1, k = 0
df = jacobian(f, [x y]), g = 1 - (df(1)^2 + df(2)^2) * (x^2 + y^2)
prog = sosprogram([x; y], a)
if k > 0
    v = monomials([x; y], [0 : k]), [prog, t] = sossosvar(prog, v)
    prog = sosineq(prog, f - a - t * g)
elseif k == 0
    prog = sosdecvar(prog, t), prog = sosineq(prog, t)
    prog = sosineq(prog, f - a - t * g)
else
    prog = sosineq(prog, f - a)
end
prog = sossetobj(prog, -a), prog = sossolve(prog)
sosgetsol(prog, a)

3.5. Numerical results. The following examples have been computed on an
ordinary PC with MATLAB 7, YALMIP 3 and the SDP solver SeDuMi 1.1. Most
of the computations took a few seconds, some of them a few minutes. The first
example corresponds exactly to the code in Subsection 3.3. To compute the others,
the variables, the polynomial $f$ and the degree bound $k$ have to be changed in that
code.

Example 35. Let $f := M(X, Y, 1)$ be the dehomogenization of the Motzkin
polynomial $M$ from (5), i.e., $f := M(X, Y, 1) = X^4 Y^2 + X^2 Y^4 - 3 X^2 Y^2 + 1 \in \mathbb{R}[X, Y]$.
We have $f^* = 0$ but $f^{\mathrm{sos}} = -\infty$ (the latter is an easy exercise). If we execute the
program from Subsection 3.3 with $k = -1$ instead of $k = 0$, the computer answers
that the SDP is infeasible which means indeed that $f^{\mathrm{sos}} = -\infty$. Executing the
same program for $k = 0, 1, 2$ yields $f_0^* \approx -0.0017$, $f_1^* \approx -0.0013$ and $f_2^* \approx 0.000066$
which is already very close to $f^* = 0$. By Theorem 30, the sequence $f_0^*, f_1^*, f_2^*, \ldots$
converges monotonically to $f^* = 0$. But the computed value $f_2^* \approx 0.000066$ is
positive so that there are obviously numerical problems. Confer [PS, Example 2].

Example 36. Define $f := M(X, 1, Z) \in \mathbb{R}[X, Z]$ where $M$ is the Motzkin
polynomial from (5), i.e., $f = X^4 + X^2 + Z^6 - 3 X^2 Z^2 \in \mathbb{R}[X, Z]$. Computation yields
$f^{\mathrm{sos}} \approx -0.1780$, $f_0^* \approx -5.1749 \cdot 10^{-5}$, $f_1^* \approx -1.2520 \cdot 10^{-7}$ and $f_2^* \approx 8.7662 \cdot 10^{-10}$
which "equals numerically" $f^* = 0$. This is in accordance with Theorem 25 which
guarantees convergence to $f^*$ since we are in the two variable case. Confer [PS,
Example 3].

Example 37. Consider the Berg polynomial $f := X^2 Y^2 (X^2 + Y^2 - 1) \in \mathbb{R}[X, Y]$
with global minimum $f^* = -1/27$ attained in $(\pm 1/\sqrt{3}, \pm 1/\sqrt{3})$. We have $f^{\mathrm{sos}} =
-\infty$ and running the corresponding program gives indeed an output saying that
the corresponding SDP is infeasible. The computed optimal values of the first
principal tentacle relaxations are $f_0^* \approx -0.0564$, $f_1^* \approx -0.0555$, $f_2^* \approx -0.0371$ and
$f_3^* \approx -0.0370 \approx -1/27 = f^*$. Confer [L1, Example 3], [NDS, Example 3] and [JL,
Example 4].
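
The stated minimum of the Berg polynomial can be confirmed by elementary means. The following Python sketch (standard library only, hypothetical helper names) checks that the gradient vanishes at $(1/\sqrt{3}, 1/\sqrt{3})$, that the value there is $-1/27$, and that a coarse grid search over $[-2, 2]^2$ finds nothing smaller. The grid search is only a plausibility check and not a proof of global optimality.

```python
import math

def f(x, y):
    """Berg polynomial X^2 Y^2 (X^2 + Y^2 - 1)."""
    return x**2 * y**2 * (x**2 + y**2 - 1.0)

def grad_f(x, y):
    """Gradient of the Berg polynomial."""
    return (2.0 * x * y**2 * (2.0 * x**2 + y**2 - 1.0),
            2.0 * x**2 * y * (x**2 + 2.0 * y**2 - 1.0))

c = 1.0 / math.sqrt(3.0)

# Coarse grid search with step 0.01 on the square [-2, 2]^2.
grid = [i / 100.0 for i in range(-200, 201)]
grid_min = min(f(x, y) for x in grid for y in grid)

print("f(c, c)      =", f(c, c))
print("grad f(c, c) =", grad_f(c, c))
print("grid minimum =", grid_min)
```

Since the gradient vanishes at the minimizer, this point lies on the gradient variety and hence in the principal gradient tentacle.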

Example 38. Let $f := (X^2 + 1)^2 + (Y^2 + 1)^2 - 2(X + Y + 1)^2 \in \mathbb{R}[X, Y]$. Since $f$ is
a polynomial in two variables of degree at most four, $f - f^*$ must be a
sum of squares (see introduction) whence $f^* = f^{\mathrm{sos}}$. By computation, we obtain for
all values $f^{\mathrm{sos}}, f_0^*, f_1^*, f_2^*$ approximately $-11.4581$. That all these computed values
are the same can be expected from $f^* = f^{\mathrm{sos}}$ and the monotonicity (17). Confer [L1,
Example 2] and [JL, Example 3].

Example 39. In [LL], it is shown that

$f := \sum_{i=1}^{5} \prod_{j \neq i} (X_i - X_j) \in \mathbb{R}[X_1, X_2, X_3, X_4, X_5]$

is nonnegative on $\mathbb{R}^5$ but not a sum of squares of polynomials. Therefore $f^{\mathrm{sos}} = -\infty$
by Remark 2 since $f$ is homogeneous. The SDP solver detects indeed infeasibility
of the corresponding SDP. We have computed $f_0^* \approx -0.2367$, $f_1^* \approx -0.0999$ and
$f_2^* \approx -0.0224$. Solving the SDP relaxation computing $f_2^*$ took already the time of
a coffee break. As in [JL, Example 6], we observe therefore that minimizing $f$ is
after the change of variables $X_i \mapsto X_1 - Y_i$ ($i = 2, 3, 4, 5$) equivalent to minimizing

$h := Y_2 Y_3 Y_4 Y_5 + \sum_{i=2}^{5} (-Y_i) \prod_{j \neq i} (Y_j - Y_i) \in \mathbb{R}[Y_2, Y_3, Y_4, Y_5]$.

Computing $h^{\mathrm{sos}}$ results in infeasibility. The numerical results using the principal
gradient tentacle are $h_0^* \approx -0.2380$, $h_1^* \approx -0.0351$, $h_2^* \approx -0.0072$, $h_3^* \approx -0.0019$
and $h_4^* \approx -0.00086285$ which is already very close to $h^* = 0$. The condition
in Theorem 30 is satisfied neither for $f$ nor for $h$ and yet it seems that we have
convergence to $h^*$. This is a typical observation that might give hope that Open
Problem 33 has a positive answer.

Example 40. Consider once more the polynomial $f = (1 - XY)^2 + Y^2$ from (6) and
Example 16 that does not attain its infimum $f^* = 0$ on $\mathbb{R}^2$. Since this polynomial
is by definition a sum of squares, we have $f^{\mathrm{sos}} = 0 = f^*$ and therefore $f_k^* = 0$ for all
$k \in \mathbb{N}$ by (17). By computation, we get $f^{\mathrm{sos}} \approx 1.5142 \cdot 10^{-12}$ which is almost zero
but also $f_0^* \approx 0.0016$, $f_1^* \approx 0.0727$ and $f_2^* \approx 0.1317$ which shows that there are big
numerical problems. We have verified that the corresponding SDPs have nevertheless
been solved quite accurately. The problem is that small numerical errors in the
coefficients of a polynomial can perturb its infimum quite a lot whenever the
infimum is not attained (or attained very far from the origin). It should be subject to
further research how to fight this problem. Anyway, the gradient tentacle method
still performs in this example much better than the gradient variety method which
yields the wrong answer 1 (as described in Subsection 1.7 above). The method of
Jibetean and Laurent gives the best results in this case [JL, Example 5].
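
The change of variables used in Example 39 can be checked numerically. The sketch below (Python, standard library only, hypothetical function names) evaluates $f = \sum_{i=1}^{5} \prod_{j \neq i} (X_i - X_j)$ at random points and compares it with $h = Y_2 Y_3 Y_4 Y_5 + \sum_{i=2}^{5} (-Y_i) \prod_{j \neq i} (Y_j - Y_i)$ at $Y_i = X_1 - X_i$. It also records the smallest sampled value of $f$ as a plausibility check of the nonnegativity shown in [LL].

```python
import random

def f(x):
    """f = sum_i prod_{j != i} (x_i - x_j) for x = (x_1, ..., x_5)."""
    total = 0.0
    for i in range(5):
        term = 1.0
        for j in range(5):
            if j != i:
                term *= x[i] - x[j]
        total += term
    return total

def h(y):
    """h = y_2 y_3 y_4 y_5 + sum_i (-y_i) prod_{j != i} (y_j - y_i), y = (y_2, ..., y_5)."""
    total = y[0] * y[1] * y[2] * y[3]
    for i in range(4):
        term = -y[i]
        for j in range(4):
            if j != i:
                term *= y[j] - y[i]
        total += term
    return total

random.seed(0)
max_gap = 0.0
min_f = float("inf")
for _ in range(1000):
    x = [random.uniform(-2.0, 2.0) for _ in range(5)]
    y = [x[0] - x[i] for i in range(1, 5)]  # Y_i = X_1 - X_i for i = 2, ..., 5
    max_gap = max(max_gap, abs(f(x) - h(y)))
    min_f = min(min_f, f(x))

print("largest |f(x) - h(y)| :", max_gap)
print("smallest sampled f(x) :", min_f)
```

The substitution removes one variable, which is why the relaxations for $h$ are smaller than those for $f$ at the same degree bound.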

3.6. Numerical stability. If the coefficients of $f$ and $\|\nabla f\|^2 \|X\|^2$ have an order of
magnitude very different from 1, then the defining polynomial $g = 1 - \|\nabla f\|^2 \|X\|^2$
for the gradient tentacle should better be replaced by $R - \|\nabla f\|^2 \|X\|^2$ where $R$
is a real number of that order of magnitude. This is justified by Remark 18 above.

Example 40 and other experiments that we did with polynomials bounded from 
below that do not attain a minimum are a bit disappointing and show that for this 
"hard" class of polynomials (exactly the class we were attacking), a lot of work 
remains to be done, at least on the numerical side. The corresponding semidefinite 
programs tend to be numerically unstable. 

For polynomials attaining their minimum, the method in [NDS] is often much 
more efficient, e.g., for Example 39. 

4. Higher gradient tentacles 

In this section, we associate to every polynomial $f \in \mathbb{R}[X]$ a sequence of gradient
tentacles. Each of these is defined by a polynomial inequality just as the principal
tentacle from Section 3 was. But the degree of this polynomial inequality for the
$N$-th tentacle in this sequence will be roughly $2N$ times the degree of $f$. This has
the disadvantage that the corresponding SDP relaxations get very big for large $N$.
Also, we have to deal for each $N$ with a sequence of SDPs. All in all, we have
therefore a double sequence of SDPs. The advantage is however that we can prove
a sums of squares representation theorem (Theorem 46) applicable for all $f \in \mathbb{R}[X]$
bounded from below independently of what is the answer to Open Problem 33.
Again, we think that this theorem is also of theoretical interest. Implementation of
the higher gradient tentacle method is analogous to Subsections 3.3 and 3.4. This
time we do not give numerical examples because of Open Problem 33, Remark 21
and numerical problems for big $N$.

Definition 41. For $f \in \mathbb{R}[X]$ and $N \in \mathbb{N}$, we call

$S(\nabla f, N) := \{x \in \mathbb{R}^n \mid \|\nabla f(x)\|^{2N} (1 + \|x\|^2)^{N+1} \le 1\}$

the $N$-th gradient tentacle of $f$.

A trivial fact that one should keep in mind is that $\|\nabla f(x)\|^2 (1 + \|x\|^2) \le 1$ and
in particular $\|\nabla f(x)\| \, \|x\| \le 1$ for all $x \in S(\nabla f, N)$. This shows that

$V(\nabla f) \cap \mathbb{R}^n \subseteq S(\nabla f, 1) \subseteq S(\nabla f, 2) \subseteq S(\nabla f, 3) \subseteq \ldots \subseteq S(\nabla f)$.

The definition of $S(\nabla f, N)$ is motivated by the following definition which is taken
from [KOS, page 79].

Definition 42. Suppose $f \in \mathbb{R}[X]$ and $N \in \mathbb{N}$. The set $K_\infty^N(f)$ consists of all
$y \in \mathbb{R}$ for which there exists a sequence $(x_k)_{k \in \mathbb{N}}$ in $\mathbb{R}^n$ such that

(18)    $\lim_{k \to \infty} \|x_k\| = \infty$, $\lim_{k \to \infty} \|\nabla f(x_k)\| \, \|x_k\|^{1 + \frac{1}{N}} = 0$ and $\lim_{k \to \infty} f(x_k) = y$.

Clearly, we have

$K_\infty^1(f) \subseteq K_\infty^2(f) \subseteq K_\infty^3(f) \subseteq \ldots \subseteq K_\infty(f)$.

The next lemma says that this chain actually becomes stationary and reaches $K_\infty(f)$.
For the proof, we refer to [KOS, Lemma 3.1].

Lemma 43 (Kurdyka, Orro and Simon). For all $f \in \mathbb{R}[X]$, there exists $N \in \mathbb{N}$
such that

$K_\infty(f) = K_\infty^N(f)$.

Now we prove for sufficiently large gradient tentacles what was Theorem 19 for
the principal gradient tentacle (which contains all higher gradient tentacles).

Theorem 44. Suppose $f \in \mathbb{R}[X]$ is bounded from below. Then $f^* \in K(f)$ and
there is $N_0 \in \mathbb{N}$ such that for all $N \ge N_0$,

(19)    $f^* = \inf\{f(x) \mid x \in S(\nabla f, N)\}$.

Proof. We know already from Theorem 19 that $f^* \in K(f)$. By Proposition 13,
at least one of the following two cases therefore must occur. The first case is that
$f^* \in K_0(f)$. Then $f^*$ is attained by $f$ on its gradient variety and therefore on the
$N$-th gradient tentacle for actually all $N \in \mathbb{N}$. Hence we can set $N_0 := 1$. In the
second case $f^* \in K_\infty(f)$, we can choose some $N_0 \in \mathbb{N}$ such that $f^* \in K_\infty^{N_0}(f)$ by
the previous lemma. Then $f^* \in K_\infty^N(f)$ for any $N \ge N_0$. This means that there
exists a sequence $(x_k)_{k \in \mathbb{N}}$ satisfying (18). Therefore $\|\nabla f(x_k)\| \, \|x_k\|^{1 + 1/N} \le \frac{1}{2}$ and
consequently

$\|\nabla f(x_k)\|^{2N} (1 + \|x_k\|^2)^{N+1} \le \|\nabla f(x_k)\|^{2N} (2 \|x_k\|^2)^{N+1} \le 1$

for all large $k$ since $\|x_k\| \ge 1$ and $2^{N+1} \le 2^{2N}$. This shows that $x_k \in S(\nabla f, N)$ for
all large $k$ which implies our claim. □
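
As an illustration of (19), consider again $f = (1 - XY)^2 + Y^2$, which does not attain its infimum $f^* = 0$. The following Python sketch (standard library only, hypothetical function names) tests membership in the $N$-th gradient tentacle, i.e., the inequality $\|\nabla f(x)\|^{2N} (1 + \|x\|^2)^{N+1} \le 1$, along the curve $(x, 1/x - 1/x^3)$. It is a numerical experiment for a few values of $N$ and $x$, not a proof.

```python
def f(x, y):
    return (1.0 - x * y) ** 2 + y ** 2

def grad_norm_sq(x, y):
    """Squared Euclidean norm of the gradient of f."""
    r = 1.0 - x * y
    gx = -2.0 * y * r
    gy = -2.0 * x * r + 2.0 * y
    return gx * gx + gy * gy

def tentacle_value(x, y, N):
    """Left-hand side ||grad f||^(2N) * (1 + ||(x, y)||^2)^(N+1) of the tentacle inequality."""
    return grad_norm_sq(x, y) ** N * (1.0 + x * x + y * y) ** (N + 1)

def in_tentacle(x, y, N):
    return tentacle_value(x, y, N) <= 1.0

def curve(x):
    """Point (x, 1/x - 1/x^3) on the curve along which f tends to 0."""
    return (x, 1.0 / x - 1.0 / x**3)

for N in (1, 2, 3):
    for x in (10.0, 100.0):
        px, py = curve(x)
        print(N, x, in_tentacle(px, py, N), f(px, py))
```

In this experiment the sampled curve points belong to $S(\nabla f, N)$ for $N = 1, 2, 3$ while $f$ decreases towards 0 along the curve, which is what (19) predicts for sufficiently large $N$.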

The great advantage of the higher gradient tentacles over the principal one is 
that they are always small enough to admit only finitely many asymptotic values, 
i.e., there is no counterpart to Example 34. 

Theorem 45. For every $f \in \mathbb{R}[X]$ and $N \in \mathbb{N}$, $R_\infty(f, S(\nabla f, N)) \subseteq K_\infty(f)$. In particular, every
$f \in \mathbb{R}[X]$ has only finitely many asymptotic values on each of its higher gradient
tentacles, i.e., the set $R_\infty(f, S(\nabla f, N))$ is finite for all $N \in \mathbb{N}$.

Proof. Let $y \in \mathbb{R}$ be such that (7) holds for some sequence $(x_k)_{k \in \mathbb{N}}$ of points
$x_k \in S(\nabla f, N)$. By Definition 41,

$\|\nabla f(x_k)\|^N \|x_k\|^N \le \frac{1}{\|x_k\|} \to 0$ for $k \to \infty$

implying (13). This shows $y \in K_\infty(f)$. □

4.1. Higher gradient tentacles and sums of squares. We are now able to
prove the third important sums of squares representation theorem of this article
besides Theorems 9 and 25.

Theorem 46. For all $f \in \mathbb{R}[X]$ bounded from below, there is $N_0 \in \mathbb{N}$ such that for
all $N \ge N_0$, the following are equivalent.

(i) $f \ge 0$ on $\mathbb{R}^n$
(ii) $f \ge 0$ on $S(\nabla f, N)$
(iii) For every $\varepsilon > 0$, there are sums of squares of polynomials $s$ and $t$ in $\mathbb{R}[X]$
such that

(20)    $f + \varepsilon = s + t(1 - \|\nabla f\|^{2N} (1 + \|X\|^2)^{N+1})$.

Moreover, these conditions are equivalent for all $f$ attaining a minimum on $\mathbb{R}^n$ and
all $N \in \mathbb{N}$. Finally, (ii) and (iii) are equivalent for all $f \in \mathbb{R}[X]$ and $N \in \mathbb{N}$.

Proof. We first show that (ii) and (iii) are always equivalent. To see this, observe
that $g_1 := 1 - \|\nabla f\|^{2N} (1 + \|X\|^2)^{N+1}$ is a polynomial that defines the set
$S := \{x \in \mathbb{R}^n \mid g_1(x) \ge 0\} = S(\nabla f, N)$. Because sums of squares of polynomials are globally
nonnegative on $\mathbb{R}^n$, identity (20) can be viewed as a certificate for $f \ge -\varepsilon$ on
$S$. Hence it is clear that (iii) implies (ii). For the reverse implication, we apply
Theorem 9 to $f + \varepsilon$ instead of $f$. We only have to check the hypotheses. Condition
(a) is clear from Lemma 24. By Theorem 45, we have that $R_\infty(f, S)$ is a finite set.
Since $f \ge 0$ on $S$ by hypothesis, this set clearly contains only nonnegative numbers.
This shows condition (b), i.e., $R_\infty(f + \varepsilon, S) = \varepsilon + R_\infty(f, S)$ is a finite subset of
$\mathbb{R}_{>0}$. Finally, the hypothesis $f \ge 0$ on $S$ gives $f + \varepsilon > 0$ on $S$ which is condition
(c).

Now suppose that $f \in \mathbb{R}[X]$ attains a minimum $f(x^*) = f^*$ in a point $x^* \in \mathbb{R}^n$.
Then $\nabla f(x^*) = 0$ and therefore $x^* \in S(\nabla f, N)$ for all $N \in \mathbb{N}$. This shows that (i)
and (ii) are in this case equivalent for all $N \in \mathbb{N}$.

By what has already been proved, it remains only to show that (i) and (ii)
are equivalent for large $N \in \mathbb{N}$ when $f \in \mathbb{R}[X]$ is bounded from below but does
not attain a minimum. But in this case, (19) holds by Theorem 44 yielding the
equivalence of the first two conditions. □

Without needing it for our application, we draw the following immediate corol- 
lary. Taking N = 1 in the second part of this corollary yields Theorem 6 above of 
Nie, Demmel and Sturmfels. 

Corollary 47. Suppose $f \in \mathbb{R}[X]$ and $f \ge 0$ on $V(\nabla f) \cap \mathbb{R}^n$. Then $f + \varepsilon$ is for
all $\varepsilon > 0$ a sum of squares modulo any principal ideal generated by a power of the
polynomial $\|\nabla f\|^2 (1 + \|X\|^2)$, i.e., for every $\varepsilon > 0$ and $N \in \mathbb{N}$, there is a sum of
squares $s$ in $\mathbb{R}[X]$ and a polynomial $p \in \mathbb{R}[X]$ such that

$f + \varepsilon = s + p \left( \|\nabla f\|^2 (1 + \|X\|^2) \right)^N$.

In particular, $f + \varepsilon$ is for all $\varepsilon > 0$ a sum of squares modulo each power of its
gradient ideal, i.e., for every $\varepsilon > 0$ and $N \in \mathbb{N}$, there is a sum of squares $s$ in $\mathbb{R}[X]$
such that

$f + \varepsilon \in s + (\nabla f)^N$.

Proof. The second claim follows from the first one. The first claim follows
immediately from implication (i) $\Rightarrow$ (iii) in Theorem 46 which always holds for all
$N \in \mathbb{N}$. □

4.2. Optimization using higher gradient tentacles and sums of squares.

The following definition can be motivated in the same way as Definition 29 in
Section 3.

Definition 48. For all polynomials $f \in \mathbb{R}[X]$, all $N \in \mathbb{N}$ and all $k \in \mathbb{N}_0$, we define
$f_{N,k}^* \in \mathbb{R} \cup \{\pm\infty\}$ as the supremum over all $a \in \mathbb{R}$ such that $f - a$ can be written
as a sum

(21)    $f - a = s + t(1 - \|\nabla f\|^{2N} (1 + \|X\|^2)^{N+1})$

where $s$ and $t$ are sums of squares of polynomials with $\deg t \le 2k$.

Again, as outlined in Section 3, computation of $f_{N,k}^*$ amounts to solving an
SDP for each fixed $N \in \mathbb{N}$ and $k \in \mathbb{N}_0$. Recalling the definition of $f^{\mathrm{sos}}$ in (4), we
have for each fixed $N \in \mathbb{N}$,

$f^{\mathrm{sos}} \le f_{N,0}^* \le f_{N,1}^* \le f_{N,2}^* \le \cdots$

and if $f$ is bounded from below, then all $f_{N,k}^*$ are lower bounds of $f^*$ by Theorem
44. It would be desirable to have also information on how the $f_{N,k}^*$ are related to each
other when not only $k$ but also $N$ varies. All we know about that is the following
proposition.

Proposition 49. For all $f \in \mathbb{R}[X]$, $N \in \mathbb{N}$ and $k \in \mathbb{N}_0$,

$f_{N+1,k}^* \le f_{N,k+d}^*$.

Proof. Let us define the polynomials $p$ and $q$ as in (11) and substitute in the identity
proved in Lemma 11 the polynomials $\|\nabla f\|^2$ for $Y$ and $\|X\|^2$ for $X$. Then we get

(22)    $1 - \|\nabla f\|^{2(N+1)} (1 + \|X\|^2)^{N+2} = p + q (1 - \|\nabla f\|^{2N} (1 + \|X\|^2)^{N+1})$

where $p$ and

$q := \left( 1 + \frac{1}{N} \right) \|\nabla f\|^2 (1 + \|X\|^2)$

are sums of squares of polynomials. The degree of $q$ is no higher than $2(d - 1) + 2 =
2d$, where $d$ denotes the degree of $f$. Now if for $a \in \mathbb{R}$ we have an identity

$f - a = s + t (1 - \|\nabla f\|^{2(N+1)} (1 + \|X\|^2)^{N+2})$

for sums of squares $s$ and $t$ with $\deg t \le 2k$, then for the same $a$

$f - a = (s + tp) + tq (1 - \|\nabla f\|^{2N} (1 + \|X\|^2)^{N+1})$

and $\deg(tq) \le 2(k + d)$. □
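
The algebra behind identity (22) can be checked with exact rational arithmetic. In the Python sketch below (standard library only), $u$ stands for the value of $\|\nabla f\|^2 (1 + \|X\|^2)$ and $w$ for the value of $1 + \|X\|^2$ at some point, so that $u \ge 0$ and $w \ge 1$. The closed form used for $p$ is an assumption: it is obtained here by solving (22) for $p$ with $q = (1 + \frac{1}{N}) u$ and is not copied from Lemma 11. The function name is hypothetical.

```python
from fractions import Fraction as Fr
import random

def split_tentacle_polynomial(u, w, N):
    """Return values (p, q) with 1 - u^(N+1) w = p + q (1 - u^N w).

    q = (1 + 1/N) u as in the proof above; p is the remainder obtained by
    solving the identity for p (an assumption of this sketch).
    """
    q = (1 + Fr(1, N)) * u
    p = 1 - (1 + Fr(1, N)) * u + Fr(1, N) * u ** (N + 1) * w
    return p, q

random.seed(1)
for _ in range(200):
    N = random.randint(1, 5)
    u = Fr(random.randint(0, 300), 100)      # u >= 0
    w = 1 + Fr(random.randint(0, 300), 100)  # w >= 1
    p, q = split_tentacle_polynomial(u, w, N)
    # Identity (22), evaluated at a point.
    assert p + q * (1 - u ** N * w) == 1 - u ** (N + 1) * w
    # Pointwise nonnegativity, a necessary condition for p and q being sums of squares.
    assert p >= 0 and q >= 0

print("identity (22) and nonnegativity hold at all sampled values")
```

Pointwise nonnegativity is only necessary for a sum of squares representation, so this check does not replace Lemma 11. It merely confirms that the factor $1 + \frac{1}{N}$ is consistent with (22).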

We conclude by interpreting Theorem 46 as a convergence result concerning the
optimal values $f_{N,k}^*$ of the proposed relaxations. This is the counterpart to Theorem
30 from Section 3.

Theorem 50. For all $f \in \mathbb{R}[X]$ bounded from below, $(f_{N,k}^*)_{k \in \mathbb{N}}$ converges
monotonically increasing to $f^*$ provided that $N \in \mathbb{N}$ is sufficiently large (depending on
$f$). If $f$ attains a minimum on $\mathbb{R}^n$, $(f_{N,k}^*)_{k \in \mathbb{N}}$ converges monotonically increasing
to $f^*$ no matter what $N \in \mathbb{N}$ is.

5. Conclusions

We have proposed a method for computing numerically the infimum of a real
polynomial in $n$ variables which is bounded from below on $\mathbb{R}^n$. As in [JL] and
[NDS], the approach is to find semidefinite relaxations relying on sums of squares
certificates and critical point theory. As one could expect, polynomials that do not
attain a minimum on $\mathbb{R}^n$ (that are either unbounded from below or have a finite
infimum that is not attained) are particularly hard to handle. In [JL], this problem
(among others) was solved by perturbing the coefficients of the polynomial to
guarantee a minimum (in particular, boundedness from below). Though the results
in [JL] are quite good, we are convinced that one should also look for other meth- 
ods that avoid perturbations and the danger of numerical ill-conditioning coming 
along with them. Proving sums of squares representations for polynomials positive 
on their gradient variety, it was shown by Nie, Demmel and Sturmfels [NDS] that 
an approach without perturbation is possible. The computational performance of 
their method is extremely good. However, for polynomials that do not attain a 
minimum, their method yields wrong answers. Combining considerable machinery 
from differential geometry and real algebraic geometry, we have shown that part 
of this limitation can be removed. By using our gradient tentacles instead of the 
gradient variety, polynomials that do not attain a minimum but are bounded from 
below can also be handled. Our method has three major problems. First, we do
not address the important question of how to check efficiently if a polynomial is
bounded from below. For polynomials that are not bounded from below, our method
still gives a wrong answer (see Example 32). Second, it turns out that solving
semidefinite programs that arise from a polynomial that does not attain a minimum
sometimes takes a surprisingly long time. And third, small numerical inaccuracies
might lead to big changes in the infimum of a polynomial if the infimum is not
attained. All three problems should be subject to further research. Polynomials
not attaining a minimum remain hard to handle in practice. On the theoretical
side, we have combined the theory of generalized critical values with the theory of
real holomorphy rings and have obtained new interesting characterizations of
nonnegative polynomials.